Adaptive Mutual Supervision for Weakly-Supervised Temporal Action Localization
Abstract
Weakly-supervised temporal action localization aims to localize actions from untrimmed long videos with only video-level category labels. Most previous methods ignore the incompleteness issue of Class Activation Sequences (CAS), suffering from trivial detection results. To tackle this issue, we propose a novel Adaptive Mutual Supervision (AMS) framework with two branches, where the base branch detects the most discriminative action regions, while the supplementary branch localizes the less discriminative action regions through an adaptive sampler. The adaptive sampler dynamically updates the inputs for the supplementary branch using a sampling weight sequence negatively correlated with the CAS from the base branch, thus encouraging the supplementary branch to localize the action regions underestimated by the base branch. To promote mutual enhancement between the two branches, we further construct mutual location supervision. Each branch adopts the location pseudo-labels generated by the other branch as localization supervision. By alternately optimizing the two branches over multiple iterations, we progressively complete the action regions. Extensive experiments on THUMOS14 and ActivityNet1.2 demonstrate that the proposed AMS method significantly outperforms the state-of-the-art methods.
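The abstract does not give formulas for the adaptive sampler or the pseudo-labels, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a one-dimensional CAS for a single class, a hypothetical "one minus the min-max normalized activation" rule for the sampling weights, and a hypothetical fixed threshold of 0.5 for the location pseudo-labels. The function names `sampling_weights` and `pseudo_labels` are illustrative.

```python
import numpy as np


def sampling_weights(cas, eps=1e-8):
    """Map a 1-D class activation sequence to sampling probabilities that
    decrease as activation increases (assumed rule: 1 - min-max normalized CAS)."""
    cas = np.asarray(cas, dtype=float)
    span = cas.max() - cas.min()
    normalized = (cas - cas.min()) / (span + eps)
    inverted = 1.0 - normalized + eps  # eps keeps every weight strictly positive
    return inverted / inverted.sum()


def pseudo_labels(cas, threshold=0.5):
    """Binarize a CAS into per-snippet location pseudo-labels (assumed threshold)."""
    return (np.asarray(cas, dtype=float) >= threshold).astype(int)


# Toy CAS from a hypothetical base branch: the first two snippets are
# strongly activated, the remaining three are underestimated.
base_cas = np.array([0.9, 0.8, 0.3, 0.2, 0.1])

weights = sampling_weights(base_cas)

# Draw snippet indices for a hypothetical supplementary branch; weakly
# activated snippets are more likely to be picked.
rng = np.random.default_rng(0)
sampled = rng.choice(len(base_cas), size=3, replace=False, p=weights)

# Pseudo-labels that one branch could pass to the other as supervision.
labels = pseudo_labels(base_cas)
```

In this toy setup the weights grow as the base activation shrinks, which is one simple way to obtain a sequence negatively correlated with the CAS. The alternating optimization of the two branches described in the abstract is not modeled here.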
Related resources
Towards Weakly-Supervised Action Localization
This paper presents a novel approach for weakly-supervised action localization, i.e., that does not require per-frame spatial annotations for training. We first introduce an effective method for extracting human tubes by combining a state-of-the-art human detector with a tracking-by-detection approach. Our tube extraction leverages the large amount of annotated humans available today and outper...
Weakly Supervised Action Localization by Sparse Temporal Pooling Network
We propose a weakly supervised temporal action localization algorithm on untrimmed videos using convolutional neural networks. Our algorithm learns from video-level class labels and predicts temporal intervals of human actions with no requirement of temporal localization annotations. We design our network to identify a sparse subset of key segments associated with target actions in a video usin...
Connectionist Temporal Modeling for Weakly Supervised Action Labeling
We propose a weakly-supervised framework for action labeling in video, where only the order of occurring actions is required during training time. The key challenge is that the per-frame alignments between the input (video) and label (action) sequences are unknown during training. We address this by introducing the Extended Connectionist Temporal Classification (ECTC) framework to efficiently e...
Weakly Supervised Action Detection
Detection of human action in videos has many applications such as video surveillance and content based video retrieval. Actions can be considered as spatio-temporal objects corresponding to spatio-temporal volumes in a video. The problem of action detection can thus be solved similarly to object detection in 2D images [3] where typically an object classifier is trained using positive and negati...
Action Recognition by Weakly-Supervised Discriminative Region Localization
We present a novel probabilistic model for recognizing actions by identifying and extracting information from discriminative regions in videos. The model is trained in a weakly-supervised manner: training videos are annotated only with training label without any action location information within the video. Additionally, we eliminate the need for any pre-processing measures to help shortlist ca...
Journal
Journal title: IEEE Transactions on Multimedia
Year: 2022
ISSN: 1520-9210, 1941-0077
DOI: https://doi.org/10.1109/tmm.2022.3213478